Monocular depth estimation is a challenging task in complex compositions depicting multiple objects of diverse scales. Despite the recent great progress thanks to deep convolutional neural networks (CNNs), state-of-the-art monocular depth estimation methods still fall short in handling such challenging real-world scenarios. In this paper, we propose a deep end-to-end learning framework to tackle these challenges, which learns the direct mapping from a color image to the corresponding depth map. First, we formulate monocular depth estimation as a multi-category dense labeling task, in contrast to the regression-based formulation. In this way, we can build upon the recent progress in dense labeling tasks such as semantic segmentation. Second, we fuse different side-outputs from our front-end dilated convolutional neural network in a hierarchical way to exploit multi-scale depth cues, which is critical for scale-aware depth estimation. Third, we propose to use soft-weighted-sum inference instead of hard-max inference, transforming discretized depth scores into continuous depth values. This reduces the influence of quantization error and improves the robustness of our method. Extensive experiments on the NYU Depth V2 and KITTI datasets show the superiority of our method over current state-of-the-art methods. Furthermore, experiments on the NYU Depth V2 dataset reveal that our model is able to learn the probability distribution of depth.
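As a concrete illustration of the soft-weighted-sum inference idea, the minimal PyTorch sketch below converts per-pixel scores over discretized depth bins into a continuous depth map: a softmax over the bin axis gives a per-pixel probability distribution, and depth is recovered as the probability-weighted sum of the bin centers rather than the arg-max bin. The log-space bin centers, the depth range, and the function name are illustrative assumptions, not the paper's exact configuration.

```python
import math
import torch
import torch.nn.functional as F

def soft_weighted_sum_depth(scores, d_min=0.7, d_max=10.0):
    """Soft-weighted-sum inference (sketch).

    scores: (B, K, H, W) per-pixel logits over K discretized depth bins.
    Returns a (B, H, W) continuous depth map.
    """
    b, k, h, w = scores.shape
    # Bin centers uniformly spaced in log-depth (assumed discretization scheme)
    centers = torch.exp(torch.linspace(math.log(d_min), math.log(d_max), k))
    # Per-pixel probability distribution over depth bins
    probs = F.softmax(scores, dim=1)
    # Continuous depth as the expectation of the bin centers under this distribution
    depth = (probs * centers.view(1, k, 1, 1)).sum(dim=1)
    return depth
```

Compared with taking the highest-scoring bin (hard-max), this expectation-style readout can fall between bin centers, which is how the quantization error introduced by discretizing depth is reduced.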